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Abstract. Some recent methods for lossy signal and image compression store 
only a few selected pixels and fill in the missing structures by inpainting with 
a partial differential equation (PDE). Suitable operators include the Laplacian, 
the biharmonic operator, and edge-enhancing anisotropic diffusion (EED). The 
quality of such approaches depends substantially on the selection of the data that is 
kept. Optimising this data in the domain and codomain gives rise to challenging 
mathematical problems that shall be addressed in our work. 

In the ID case, we prove results that provide insights into the difficulty of this 
problem, and we give evidence that a splitting into spatial and tonal (i.e. function 
value) optimisation does hardly deteriorate the results. In the 2D setting, we present 
generic algorithms that achieve a high reconstruction quality even if the specihed 
data is very sparse. To optimise the spatial data, we use a probabilistic sparsification, 
followed by a nonlocal pixel exchange that avoids getting trapped in bad local optima. 
After this spatial optimisation we perform a tonal optimisation that modihes the 
function values in order to reduce the global reconstruction error. For homogeneous 
diffusion inpainting, this comes down to a least squares problem for which we prove 
that it has a unique solution. We demonstrate that it can be found efficiently with a 
gradient descent approach that is accelerated with fast explicit diffusion (FED) cycles. 
Our framework allows to specify the desired density of the inpainting mask a priori. 
Moreover, is more generic than other data optimisation approaches for the sparse 
inpainting problem, since it can also be extended to nonlinear inpainting operators 
such as EED. This is exploited to achieve reconstructions with state-of-the-art 
quality. 

Apart from these specific contributions, we also give an extensive literature survey 
on PDE-based image compression methods. 
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1 Introduction 

One of the most fascinating properties of variational methods and partial 
differential eqnations (PDE’s) in image analysis is their property to hll in 
missing data. This hlling-in effect has a long tradition: It can be fonnd 
already in the seminal optic flow paper of Horn and Schunck [BT], where the 
smoothness term propagates information to regions in which the data term 
is small or even vanishes. More explicit interpolation ideas are exploited in 
so-called inpainting approaches. They go back to Masnon and Morel [Bl], 
became popnlar by the work of Beltalmio et al. [H] , and have been modihed 
and extended by many others snch as PEI El EDI 1131]. Inpainting problems 
arise when certain parts of the image domain are degraded in such a way that 
any useful data is unavailable. In the remaining parts of the image domain, 
the data is available and undegraded. The goal is to reconstruct the missing 
information by solving a suitable boundary value problem where the data in 
the undegraded regions serve as Dirichlet boundary conditions. 

PDE-based image compression methods constitute a challenging applica¬ 
tion area of inpainting ideas; see Section for a detailed review and many 




references. These lossy compression approaches drive inpainting to the ex¬ 
treme: They store only a very small fraction of the data and inpaint the 
missing data with a suitable differential operator. However, PDE-based com¬ 
pression differs fundamentally from classical inpainting by the fact that it has 
the liberty to select the data that is kept. Typically one specihes only that a 
certain fraction of the image data is stored, and the codecj^can optimise this 
data in order to minimise the reconstruction error. Obviously this involves a 
spatial optimisation step that selects the locations of the kept pixels. We call 
these locations the inpainting mask. Interestingly, it may also be benehcial to 
optimise the grey values of the selected pixels: While changing the grey values 
deteriorates the approximation quality inside the mask, it can improve the 
approximation in the (usually much larger) inpainting domain where no data 
is specified. This grey value optimisation is also called tonal optimisation. 

In order to judge the potential of PDE-based compression approaches, 
it is important to go to the extreme and hnd the limits that inpainting 
with optimised sparse data can offer. This should include both spatial and 
tonal optimisation for different inpainting operators. In a hrst step, it can 
make sense to have a clear focus on quality and postpone considerations of 
computational efficiency and coding costs to later work when the quality 
questions are answered in a satisfactory way. 

Eollowing this philosophy, the goal of our contribution is to explore the 
potential of PDE-based inpainting of sparse data that can be optimised both 
spatially and tonally. In order to evaluate different inpainting operators in 
a fair way, we aim at algorithms that are as generic and widely applicable 
as possible, and we optimise quality rather than speed. Computation time 
becomes only relevant for us when different methods lead to identical results. 
Eirst our insights and algorithms are derived in more restricted settings. 
Afterwards, we investigate how these ideas can be extended and generalised. 

Our work is organised as follows. We start by presenting a survey on PDE- 
based image compression in Section Afterwards we discuss the homogeneous 
diffusion inpainting framework that is used for most of our optimisation 
algorithms in this work. In Section]^ we present methods that aim at optimal 
spatial and greyvahie data for homogeneous diffusion interpolation in ID. The 
subsequent Sectiondeals with optimisation strategies in 2D. In Section]^ we 
present extensions to higher order and nonlinear diffusion inpainting operators. 
Einally we conclude with a summary in Section 

Our discussions are based on a conference paper EH, in which we have 
introduced probabilistic sparsihcation and tonal optimisation for homogeneous 
diffusion inpainting. Here we extend these results in a number of ways. Our 
main innovations are: 

1. We present a new section that gives a detailed review of PDE-based image 

compression methods. Since this area has developped in a very fruitful 
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A codec is a system for image coding and decoding. 



way and no review article is available so far, we feel that snch a review 
section can be a nseful starting point for readers who would like to learn 
more on this topic. 

2. In another new section we derive a mathematically sound approach to 
solve the data optimisation problem for homogeneoues diffusion inpainting 
of strictly convex signals in ID. This restricted ID setting allows us to 
gain insights into the nonconvexity of the problem and to quantify the 
errors that are caused by separating spatial and tonal optimisation. 

3. We introduce a new algorithm for tonal optimisation, based on a gradient 
descent approach with an acceleration by a fast explicit diffusion (FED) 
strategy. While it achieves the same quality as previous approaches, we 
show that it is more efficient. 

4. We explain how to extend our methods to more advanced linear or non¬ 
linear inpainting operators such as biharmonic inpainting and inpainting 
by edge-enhancing anisotropic diffusion. In this way we achieve sparse 
PDE-based reconstructions of hitherto unparalleled quality. 

2 A Review of PDE-Based Image Compression 

Before describing our approach on spatial and tonal data optimisation in 
the forthcoming sections, we hrst review existing approaches to PDE-based 
image compression. These methods belong to the class of lossy compression 
techniques. Hence, they aim at hnding very compact hie representations such 
that the resulting reconstructions approximate the original image well. 

PDE-based appoaches are alternatives to established transform-based 
codecs such as the JPEG standard m which uses the discrete cosine 
transform (DCT), or its successor JPEG 2000 |126j that involves the wavelet 
transform. However, since they follow a completely different concept, it is 
worthwhile to give an overview of their various aspects. 

To this end, we review data optimisation questions, the choice of appropri¬ 
ate inpainting operators, and strategies to store the optimised data efficiently. 
Afterwards we describe PDE-based codecs that are based on specihc image 
features, and we discuss algorithmic aspects, variants and extensions, as well 
as relations to other approaches. 

Our review focuses on methods that use PDE’s as a main tool for com¬ 
pression. This means that we do not discuss the numerous PDE-based or 
variational techniques that have been advocated as a preprocessing step before 
coding images or videos (see e.g. [7H1II2H]) or as a postprocessing tool for 
removing coding artifacts (such as [31 HSl [H2]). 

2.1 Data Optimisation 

Galic et al. [331IM] have pioneered research on data selection methods for 
PDE-based image compression by proposing a subdivision strategy that in¬ 
serts mask points at locations where the approximation error is too large. 







While these authors used a triangular subdivision, Schmaltz et al. ma modi¬ 
fied this concept to rectangular subdivisions and added several algorithmic 
improvements that allowed them to beat the quality of JPEG 2000 with 
an anisotropic diffusion operator. Belhachmi et al. [S] have used the theory 
of shape optimisation to derive analytic results for the spatial optimisation 
problem in the case of homogeneous diffusion inpainting. Discrete approaches 
for the spatial and tonal optimisation problem with homogeneous diffusion 
inpainting have been presented by Mainberger et al. IS]. The spatial op¬ 
timisation is based on a probabilistic sparsification strategy with nonlocal 
pixel exchange, while tonal optisation is formulated as a linear least squares 
problem. Hoeltgen et al. considered a control theoretic approach to the 
problem of data optimisation, also for homogeneous diffusion inpainting. They 
minimised the quadratic reconstruction error with a sparsity prior on the 
mask and a nonconvex inpainting constraint. Their numerical approach solves 
a series of convex optimisation problems with linear constraints. A similar 
constrained optimisation model was considered by Ochs et al. nnn who used 
this problem as a test scenario for their iPiano algorithm. Qualitatively both 
techniques gave similar results, but the iPiano approach offered advantages 
in terms of efficiency. 

While all the before mentioned strategies pay much attention to spatial 
optimisation, less work has been devoted to tonal optimisation so far. Early 
heuristic attempts go back to Galic et al. |S1]. They lifted the grey values up 
or down to the next quantisation levels in order to improve the approximation 
quality in the vicinity of the data. A first systematic treatment in terms of a 
least squares optimisation problem was given by Mainberger et al. ED, who 
used a randomised Gaufi-Seidel algorithm. Faster numerical algorithms for 
the same problem have been proposed by Ghen et al. [21], who applied the 
L-BFGS method, and by Hoeltgen and Weickert [M], who advocated the 
LSQR algorithm and a primal-dual technique. 

2.2 Finding Good Inpainting Operators 

Although many theoretical investigations on data selection methods are based 
on homogeneous diffusion inpainting, the task of finding better inpainting 
operators has been a research topic for a decade. Already in 2005, Galic et 
al. [53] have shown that edge-enhancing anisotropic diffusion (EED) jl31j is 
better suited for PDE-based image compression than homogeneous diffusion. 
The favourable performance of anisotropic diffusion approaches of EED 
type has been confirmed by more detailed comparisons later on, both for 
randomly scattered data [MIHl] and for compression-optimised data [2!J[ 1118] . 
Some of these evaluations involve many other inpainting operators such as 
biharmonic interpolation [33] and its triharmonic counterpart, the absolute 
minimal Lipschitz extension [22], isotropic nonlinear diffusion [231 EDS] and 
its approximations of total variation (TV) inpainting [231 EH], as well as 













the inpainting operators of Bornemann and Marz US and of Tschnmperle 
imi^ Total variation inpainting performed consistently bad, showing that 
operators which work fairly well for denoising are not necessarily good for 
sparse inpainting. Bihamonic inpainting, on the other hand, tnrned ont to 
be an interesting alternative to BED, when linearity or absence of any need 
for parameter specihcation are important, and over- and undershoots are 
acceptable. 

2.3 Storing the Data 

Besides hnding sparse inpainting data that allow to approximate the original 
image in high quality, every practically useful codec must be able to store 
these data efficiently. Attempting this in a naive way would create a huge 
overhead, and the resulting codec would not be competitive to the quality 
that established transform-based codecs such as JPEG or JPEG 2000 can 
achieve for the same hie size. The search for data that can be stored efhciently 
may even lead to compromises w.r.t. the data optimality: In order to use less 
bits, it is common to round the intensity values to a relatively small set of 
quantisation levels. Regarding the data localisation, it can be helpful to avoid 
a completely free conhguration of data points and allow only masks which are 
represented efhciently with binary trees that result from subdivision strategies 
|S3l El [IIH]. Moreover, also lossless entropy coders such as Huffman coding 
|SSj, JBIG |72], or PAQ jS2] are highly useful to remove the redundancy of 
the data. 

2.4 Feature-based Methods 

Methods that rely on specihc - often perceptually relevant - features rather 
than on data that are optimised for compression applications can be seen as 
predecessors or variants of PDE-based codecs. Let us now discuss the main 
types of such features and their relevance for coding. 

Contour-based Features Edges are perceptually relevant features that 
have been analysed for a long time. Already in 1935, Werner m has 
investigated hlling-in mechanisms from edges in biological vision systems. In 
computer vision, there are numerous appoaches in scale-space and wavelet 
theory that attempt to reconstruct an image from its edge information 
[2HI El in SSI El n n n msi mi, often over multiple scales and 
supplemented with additional information. 

Also within coding, publications that exploit information from edges or 
segment boundaries and combine it with inpainting processes have a long 
tradition P El III El EZl El ESI El ESI EDI ESI EZl IMl IISZI lISS] • However, for 
general images these features are often suboptimal as inpainting data. On the 
other hand, for specihc applications where the images are piecewise constant or 












piecewise smooth, such codecs can achieve competitive results. This includes 
cartoon-like images laa or depth maps 1271 ESI Eg EH- For homogeneous 
diffusion inpainting and piecewise almost constant data, choosing data near 
edges can be justified by the theory of Belhachmi et al. jB]- It suggests to 
select the mask density as an increasing function of the Laplacian magnitude. 
Contour-based approaches also beneht from the fact that encoding of contour 
data is relatively inexpensive: One can nse e.g. chain codes, and a smooth 
intensity variation along a contour allows subsampling without substantial 
quality degradations; see e.g. [50] . 

Related ideas are also advocated in compnter graphics for image editing 
and vectorised image representations Uni [701E0211103]. Moreover, contonr- 
based codecs that rely on inpainting processes with PDF’s have been nsed 
successfully for encoding digital elevation maps |1EI 11211 Ild8 
tion is available in terms of level lines. 


where informa- 


Point-based Features In order to end up with more compact image repre¬ 
sentations, it appears tempting to consider point-based featnres instead of 
contour-based ones. Also here several results can be found in the literature. 

Johansen et al. nn and Kanters et al. have performed research on 
reconstructing images from their top points in scale-space. Lillholm et al. |ES| 
present more general discnssions on how to reconstrnct an image from a 
snitable set of feature points and their derivatives (local jet). More recently, 
also reconstrnction attempts are described that are based on SIFT features 
pss] or local binary descriptors [33] . 

So far, all image reconstrnctions from isolated featnre points that allow 
some perceptual interpretations or have shown their merits in other applica¬ 
tions such as image matching do not offer a quality level that is sufficient for 
compression applications. It appears that these features are too sparse while 
lacking optimality. 

2.5 Fast Algorithms and Real-Time Aspects 

In real-time applications, the efficiency of a codec becomes a central criterion. 
Here one shonld distingnish between real-time capabilities dnring encoding 
and real-time decoding. Often the latter one is more important: For instance, 
when a company encodes a movie, it is acceptable if the creation of an 
optimised file takes many hours, as long as the customer is able to decode it 
in real-time. 

On the encoding side, the approach of Belhachmi et al. jS] is algorith¬ 
mically sufficiently simple to allow real-time performance: It compntes the 
Laplacian magnitnde of a Ganssian-smoothed image and converts the re¬ 
sult to a impainting binary mask by means of a halftoning method snch as 
Floyd-Steinberg dithering [3T]. Also the codec of Mainberger et al. [50] is of 
comparable computational simplicity. Most other approaches are not real-time 









capable during encoding since they apply a sophisticated optimisation of data 
and parameters. However, also for data optimisation, substantial algorithmic 
accelerations have been achieved recently isaoroi]. 

On the decoder side, no time-consuming optimisation steps are required, 
and the main computational effort for PDE-based codecs consists of solving an 
inpainting problem. Thus, real-time decoding is possible if one uses appropriate 
algorithms and exploits the capabilities of modern hardware. In 2007, Kbstler 
et al. eg presented real-time algorithms for the subdivision approach of Galic 
et al. [S3]. With multigrid methods for homogeneous diffusion inpainting or 
lagged diffusivity multilevel approaches for anisotropic diffusion inpainting, 
they could process more than 25 greyscale images per second of size 320 x 240 
on a Sony PlayStation 3. Also Mainberger et al. ra proposed a multigrid 
approach for their edge-based homogeneous diffusion codec. On a PC CPU 
they reported runtimes around 6 frames per second for decoding a 256 x 256 
colour image. This is about 6 times higher than JPEG 2000 and 24 times 
higher than JPEG. For linear inpainting problems with very sparse data, 
algorithms based on discrete Green’s functions can be an interesting alternative 
to multigrid methods [EB]- Recently, Peter et al. [106] managed to decode 
25 greyscale images per second with VGA resolution 640 x 480 by means of 
anisotropic diffusion inpainting on a PC architecture with GPU support. This 
was accomplished with so-called FED schemes [33] that are well-suited for 
parallel implementations. 


2.6 Hybrid Image Compression Methods 

The usefulness of embedding inpainting concepts into existing transform- 
based image compression methods is studied in a number of publications 
[571 Hill 11301 11391 I14:0j . Their goal is to benefit simultaneously from the 
advantages of inpainting methods and transform-based codecs. While PDE- 
based inpainting approaches often perform better at edges, transform-based 
codecs are favoured inside the regions, in particular if they are dominated 
by texture. By construction, such hybrid methods restrict themselves to the 
main application fields of both paradigms rather than pushing inpainting 
ideas to the limit. 

An example of a hybrid approach that remains entirely within the inpaint¬ 
ing framework is nog. It combines EED-based inpainting with the sparse, 
exemplar-based appoach of Facciolo et al. m- This results in a more faithful 
recovery of textured regions than a pure EED-based method. 

Chan and Zhou [23] have proposed a hybrid method that modihes wavelet- 
based image compression by a TV regularisation of the wavelet coefficients. 
More recently, Moinard et al. m have used TV inpainting to predict DCT 
coefficients in a JPEG-based codec. 














2.7 Modifications, Extensions, and Applications 


A progressive mode option is a useful feature of a codec. It allows to encode 
and transmit data in a coarse-to-fine way. Thus, in the decoding phase, the 
representation can be subsequently rehned while reading the data stream. 
In |117) . two progressive modes have been suggested for the EED-based 
codec, and it has been shown that for high compression rates, their quality is 
competitive to JPEG and JPEG 2000. 

By construction, PDE-based codecs are well-suited for encoding specihc 
regions of interest with higher precision. All one has to do is to increase the 
density of the inpainting data in the region of interest. For more details, we 
refer to |106j . 

Since a large fraction of modern imagery consists of colour image data, 
codecs should be able to handle colour images efficiently. While for most PDE 
approaches, extensions to vector- and even matrix-valued data exist, these 
modihcations are not necessarily optimal for compression applications. As 
a remedy, in jlU7] an extension of the EED-based codec to colour images 
is proposed that uses the YCbCr representation. The perceptually relevant 
luma channel is stored with fairly high accuracy, while the chroma channels 
are encoded with very sparse data. In the reconstruction phase the chroma 
inpainting is guided by the structural information from the luma channel. 

Extending PDE-based codecs from 2-D images to three-dimensional data 
sets does not create severe difficulties. Most PDF’s have natural extensions 
to higher dimensions, and also other concepts generalise in a natural way: For 
instance a rectangular subdivision in 2D is replaced by a cuboidal subdivision 
in 3D |118j . Because of the richer neighbourhood structure, the redundancy 
in higher dimensional data allows to achieve a higher compression rate for 
the same quality as a lower dimensional codec. 

PDE-based compression strategies have also been investigated for encoding 
contours ma and surfaces pins]. These publications follow the philosophy 
of PDE-based image compression, but replace the Laplacian by the contour 
curvature or the Laplace-Beltrami operator. 


Obviously one can apply any method for compressing 3D data also to 
video coding, if one models a video or parts of it as a spatiotemporal data 
block. Other video coding approaches have been suggested that use inpainting 
concepts within the H.264/MPEG-4 AVG video coding standard jH] ESI IS7j . 
In |119) . the authors have implemented a video compression system that 
combines PDE-based image compression with pose tracking. 


Cryptographic applications are studied in [22], where the authors combine 
PDE-based compression with steganographic concepts in order to hide one 
image in another one, or parts of an image in other parts. 














2.8 Relations to Other Methods 


PDE-based codecs that select the inpainting mask with triangular or rectan¬ 
gular subdivisions have structural similarities to piecewise polynomial image 
approximations based on adaptive triangulations or quadtrees. Extensive 
research has been performed on these approximations; see ini Eni ESI SOI El 
17511771 IH21 ESI I122[ I124[ I125j and the references therein. Often such approaches 
use linear polynomials within each triangle or rectangle. In this case, one 
may also interpret them as a solution of the Laplace equation with Dirich- 
let boundary data obtained from a linear interpolation of the vertex data. 
Thus, they can be seen as specihc localised PDE-based codecs. Alternative 
interpretations can be given in terms of hnite element approximations. In the 
one-dimensional case, the linear spline approximation is even fully equivalent 
to homogeneous diffusion inpainting. 

Inpainting with linear differential operators allows also an analytical 
representation of its solution in terms of the Green’s function of the operator. 
This has been used in to relate linear PDE-based inpainting to sparsity 
concepts: Discrete Green’s functions serve as atoms in a dictionary that gives 
a sparse representation of the inpainting solution. In the one-dimensional 
case, discrete analytic derivations are presented in |ll()j . 

On the other hand, continuous Green’s functions are also used as radial 
basis functions in scattered data interpolation |2U]. Moreover, some radial basis 
functions can be seen as rotationally invariant multidimensional extensions 
of spline interpolation. This establishes a connection between PDE-based 
inpainting and image representations in terms of radial basis functions and 
splines, such as [H [Ml SHI 1129] . 


3 Inpainting with Homogeneous Diffusion 

In this section, we will briefly present the homogeneous diffusion reconstruction 
method (also known as Laplace interpolation) that lies at the basis of our 
approach and will be used for most of the results presented here. Homogeneous 
diffusion is among the simplest inpainting processes that one can consider. 
This makes it suitable for theoretical investigations. Nevertheless, one should 
emphasise that even such a simple method can yield very good results if the 
interpolation data is chosen in an appropriate way; see e.g. [SI EEllHIl EnH- 
Let / : i7 —)■ M be a smooth function on some bounded domain i? C 
with a sufficiently regular boundary dfl. Throughout this work, we will 
restrict ourselves to the case n = 1 (ID signals) and n = 2 (greyscale images), 
although many results will also be valid for arbitrary n ^ 1. Moreover, let us 
assume that there exists a set of known data ^ i7. Homogeneous diffusion 








inpainting considers the following partial differential eqnation with mixed 
boundary conditions. 


Au = 0 

on i? \ ilx, 

(1) 

u = f 

on Qk, 

(2) 

dnU = 0 

on df2 \ dfiKi 

(3) 


where dnU denotes the derivative of u in outer normal direction. We assume 
that both boundary sets OQk and dQ \ OQk are non-empty. Equations of 
this type are commonly referred to as mixed boundary value problems and 
sometimes also as Zaremba’s problem named after Stanislaw Zaremba, who 
studied such equations already in 1910 dn]. The existence and uniqueness of 
solutions has been extensively analysed during the last century. Showing that 
([^-(|^ is indeed solvable is by no means trivial. Generally, one can either 
show the existence of solutions in very weak settings or one has to impose 
strong regularity conditions on the domain. The references [3 ES] discuss 
the solvability in a general way. In [HS] it is shown that a Holder continuous 
solution exists if the data is sufficiently regular. In [IS], the author discusses 
the regularity of solutions on Lipschitz domains. A more general existence 
theory for solutions is given in [201 • Further investigations on mixed boundary 
value problems can also be found in [2Sl El] • A particularly easy case is the 
ID setting, where the solution can obviously be expressed using piecewise 
linear splines interpolating data on dflx- 

For convenience we introduce the conhdence function c which states 
whether a point is known or not: 


c {x) 


1 for a; G i7j^, 

0 for a; G i? \ f^K- 


Then it is possible to write 0-0 as 


c(a;) {u{x)—f{x)) — (1 —c(a;)) Au{x) = 0 on i7, 
dnU = Q ondQ\dQK- 


(4) 

(5) 

( 6 ) 


For most parts of this text we will prefer this formulation, as it is more 
comfortable to work with in the discrete setting, which can be obtained as 
follows. Let J := {1,..., A^} be the set of indices enumerating the discrete 
sample positions, and K J the subset of indices of known samples. Then 
we can express the discrete version of / as a vector / = {fi,..., and the 
corresponding solution as a vector u G The binary mask c G where 
Ci := 1 ii i E K and q := 0 otherwise, indicates the positions of the Dirichlet 
boundary data. At last, the Laplacian A is discretised by standard means of 
hnite differences [HS]- Hence, a straightforward discretisation of (|^-([^ on a 
regular grid yields 


C {u-f) - {I-C)Au = 0 


( 7 ) 



where I is the identity matrix, C := diag (c) is a diagonal matrix with the 
components of c as its entries, and A = (ajj) is a symmetric N x N matrix, 
describing the discrete Laplace operator A with homogeneons Nenmann 
boundary conditions on df2 \ OQk- Its entries are given by 






hj 

te{a:,2y} jeA/ip) ^ 

0 


{j eAfeii)), 

(j =*), 
(else) , 


( 8 ) 


where A/f(i) are the neighbours of pixel i in f'-direction, and he is the corre¬ 
sponding grid size. By a simple reordering of the terms, ([^ can be rewritten 
as the following linear system: 


{C-{I-C)A)u = Cf. (9) 

= :M 

It has been shown in [90] that this linear system of equations has a unique 
solution and that it can be solved efficiently with bidirectional multigrid 
methods. 


4 Optimisation Strategies in ID 


The choice of the position of the Dirichlet boundary data has a strong influence 
on the quality of the reconstruction. In order to gain some analytical insight on 
this problem, we restrict ourselves in this section to the ID continuous setting 
and consider in Section 4T how to hnd the optimal positions of the mask 
points c. Optimising position and value of the Dirichlet data simultaneously 
will be treated in Section 14.21 


4.1 Optimal Knots for Interpolating Convex Functions 

In this section, we assume that / ; [a, 6] —)■ M is always a strictly convex 
and continuously differentiable function. Our goal is to hnd a distribution 
of -|- 1 knot sites {cj}(^Q in the interval [a, b] such that the interpolation 
error with piecewise linear splines becomes minimal in the Li norm. For ID 
formulations, this is equivalent to determining the Dirichlet boundary data 
in 0-0 such that the solution u gives the best possible reconstruction to / 
in the Li sense. In concrete terms, this means that we seek iV -|- 1 positions 
inside the interval [a, b] with 


a =: Co < Cl < C2 < ... < Ctv-i < Cn ■= b 


( 10 ) 







and a piecewise linear spline L {cj}^Qj interpolating / at the positions q, 
such that the error 


E ({cJilo) — \l {x] {cJ^o) - fix) 


dx 


(11) 


becomes minimal. This optimisation problem is also called free knot problem 
and has been studied for more than fifty years. We refer to [SH ES] for 
similar considerations as in our work and to dn EH EE Eg and the references 
therein for more general approaches. Note that it is quite common to relax 
the interpolation condition and to generalise the problem to explicitly allow 
approximating functions. Further details on interpolation and approximation 
techniques can be found in (11^! - Alternative ways to optimise linear spline 
interpolation are discussed e.g. in nm. 


For technical reasons, the knots Cq and ctv in (10) are fixed at the boundary 


of the considered interval. The choice of the Li norm is especially attractive 
in this case: Due to the convexity of /, the integrand 

L [x- - fi.x) 

is nonnegative for all x in [a, h\. Thus, we can simply omit the absolute value 
in O' This observation simplifies the derivation of optimality conditions 
below. Furthermore, the requirement that the linear spline L must coincide 
with / at the knot sites q allows us to state the interpolating function in an 
analytic form: If x G [cj,Cj+i], then 

fjCi+l) - fj Cj) ^ 

Q+i Cj 

Note that requiring that the c* are distinct is necessary to avoid a division by 
0. A straightforward computation gives 

E ({ci}*=o) = / {d}*=o) - fi.x)) dx 


L (x] {ci}i 


N 

=0 


■{x - Cj) + f(Ci). 


( 12 ) 


1 N-1 

= 9 II (D+1 - Ci) if (q+i) + / id)) 

^ i=0 


- [ fix)dx. 

J a 


(13) 


This expression also corresponds to the error of the composite trapezoidal rule 
for the numerical integration of / with non-equidistant integration intervals. 

Equation (13) will be our starting point for developing an algorithm to 
determine the optimal knot sites. However, before we will do so, we present a 
result which shows the difficulty of the free knot optimisation problem. 


Proposition 1. If the function / : [a, 6] —)■ M is strictly convex and twice 
continuously differentiable, then the energy 0 is convex in {cj}^Q for 3 
knots (i.e. N = 2). In general, it is not convex for any other number of knots 
larger than 3 (i.e. N >2). 








Proof. In the case of three knots, we only have one free variable, and it follows 
from (13) that the error is given by 


E (ci) = ^ |^(ci - a) (/ (a) + / (ci)) 

+ {b- Cl) (/ (b) + / (ci))^ - ^ f{x) dx. (14) 


Further, the second derivative of E is given by 

h — o 3^ 

^ E (ci) = ^/(ci) >0 Vci e [a, b]. (15) 

In order to demonstrate that the error function is nonconvex for a higher 
number of interpolation points, it suffices to provide a counterexample. Let us 
consider the function f{x) = exp [x) on the interval [—15,15] as well as the 
two knot sets Ui = { — 15,10.65,14.65,15} and U 2 = { —15, —1.2,12.5,15}. If 
E were convex, then it must also be convex along the line in that connects 
Ui and U 2 (interpreting both knot sets as vectors in M^). However, the plot 
of E{{1 — t)Ui + 1112 ) with t G [0,1] depicted in Fig. [^displays a nonconvex 
behaviour. □ 



Fig. 1. Evolution of the error for interpolating the function exp(a;) with knots along the segment 
with bounds U\ = ( — 15,10.65,14.65,15)^ and U 2 = ( — 15, —1.2,12.5,15)^. It shows a nonconvex 
behaviour. For better readability, the function values have been rescaled by a factor 10“®. 


We remark that the previous proposition does not claim that the energy 
can never be convex for more than three knots. Indeed, for affine functions 
of the form mx + k with real coefficients m and k the energy is identical 0 
for any number of knots, and thus also convex. This shows that even under 
weaker conditions as in the proposition, the energy may be convex. 






A New Algorithm for the Free Knot Problem for Linear Spline 
Interpolation A necessary condition for a minimiser {c*}^q of (13) is 
VF ({c*}^o) “ 0- However, it follows from Proposition that this condition 
is not sufficient. There may exist several global and/or local minima. A simple 
computation leads to the following system of A —1 nonlinear equations in the 
N — 1 unknowns Ci,...,Cn-i- 


f z = i,...,A-i. (16) 

Q+i Q—1 

It should be noted that each knot only depends on its direct neighbonrs. 
Therefore, odd indexed knots only depend on even indexed knots and vice 
versa. Since / is strictly convex, it follows that /' is strictly monotonically in¬ 
creasing. Thus, its inverse exists and is unique at every point of the considered 
interval. This motivates the following iterative scheme. 

Algorithm 1: Spatial Optimisation in ID 
Input: 

A -|- 1, the desired number of knots. 

Initialisation: 

Choose any initial distribntion {c/}(lo with Cq = a and c% = b, e.g. a 
uniform distribntion of the knots on the interval [a, b]. 

Repeat until a fixed point is reached: 

Update even knots for all possible v. 



Update odd knots for all possible r. 

C2i+1 ■ ) k+l _ k+1 ■ UoJ 

\ ^2i+2 ^2i ) 

Output: 

The final knot distribution {c*}^q. 


Observe that the above scheme is similar to a Red-Black Ganfi-Seidel 
scheme for the solution of linear systems: We update the variables iteratively 
and nse newly gained information as soon as it becomes available withont 
interfering with the direct neighbours of the data point. An important issue 
is that the knots Cj are not allowed to fall together. The following proposition 
shows that this cannot happen. 










Proposition 2. The iterative scheme (17)-(18) preserves the ordering of the 
knot positions. Thus, we have e.g. 


-i-l < Ci 


< c: 


i+i 


-'i—l 


< d. 


fc+i 


< c: 


,fc+i 

i+l 


V k, i. 


(19) 


Proof. Since / is differentiable on [cj_i, q+i] for all i, the mean value theorem 
guarantees the existence of a q in (ci_i,Ci+i) such that 


f (Ci) 


/ (c»+i) - / (Cj-l) 
Cj+1 Cj—1 


( 20 ) 


Thus, our iterative scheme must necessarily preserve the order of the knots. 

□ 


The next theorem shows that the iterates from our algorithm monotonically 
decrease the considered energy. 

Theorem 1. If the function f : [a,b] —)• M is strictly convex and twice 
continuously differentiable, the iterates (^{c^}^, 
decrease the Li error (0 in each step, i.e. we have 


obtained in ( |17[ ) and ( |18| ) 


\fk. 


( 21 ) 


Proof. By alternating between the update of the odd and even indexed sites, 
the problem decouples. The new value of only depends on and 
c^_^_i, which are fixed. Therefore, the problem is localised, and we can update 
all the even/odd indexed knots independently of each other. It follows that 
one iteration step is equivalent to finding the optimal such that the 


interpolation error becomes minimal on 


W+1 


for all even/odd i. The 


global error can now be written as the sum of all the errors over the intervals 




and will necessarily decrease when each term of this sum decreases, 
repositionshows that the considered energy is convex for three knots. Thus, 
'VE{{cjyjtf_i) = 0 is not only a necessary, but also a sufficient condition 
for being a minimum on . This means that (17) will not increase 


the error when updating even indexed knots and subsequently, (18) will not 


increase the error while updating the odd numbered sites. Therefore, it follows 
that the overall error cannot increase in an iteration step. □ 


Since the error is bounded from below by 0, we also obtain the following 
result. 

Corollary 1. The sequence [e is convergent. 

Note that the previous statements do not claim the convergence of the 
sequence Since the problem is nonconvex, the global minimum of 

the considered energy is not necessarily unique. In that case, our algorithm 
might alternate between several of the minimisers. These minimisers are. 














from a qualitative point of view, all equivalent, since they yield the same 
error. However, they might not be the global minimiser. Also note that the 
theorem of Bolzano-Weierstrass asserts that contains at least one 

convergent subsequence since all the must necessarily lie in the interval [a, h\. 
Finally, we remark that in our test cases the results were always of very good 
quality. This gives rise to the conjecture that the found knot distributions 
are close to a global minimnm. 


Numerical Experiments Let us now perform experiments with our new 
algorithm. We consider the convex fnnction 

/ (x) = exp (2x — 3) -|- X 

on the interval [—4,4]. Figure depicts the evolution of the error, while 
Fig-i exhibits the resulting distribution of the knots. The experiments were 
done with a randomised initial distribntion of the knots and 5000 iterations. 
Interestingly, the iterates always converged already after very few iterations. 
In accordance with the theory from the previous section, we also note that the 
error is monotonically decreasing, both with respect to the nnmber of knots 
and with the number of iterations. Another interesting observation is the 
influence of an additionally introduced knot on the other knots: Adding further 
interpolation sites has a global influence on all the other knots. Moreover, we 
enconnter a higher knot density in regions with large cnrvatnre than in flat 
regions. This is in agreement with the theory of Belhachmi et al. [S] . 



Fig. 2. Evolution of the Li error as a function of the iterates for different numbers of knots for the 
function / (i) = exp (2x — 3) -I- a; on the interval [—4,4]. Note that the error is decreasing both 
with respect to the number of knots and with respect to the number of iterations. Furthermore, 
the curves suggest a rather fast convergence to a stationary energy value. 
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Fig. 3. Distribution of the knots corresponding to Figure]^ for the converged state of the function 
f{x) = exp(22; — 3) -I- a:. Note that there are almost no knots in flat regions, whereas there is a 
high density in regions with large curvature. Changing the number of knots actually influences the 
position of all the knots. 


4.2 Optimal Knots for Approximating Convex Functions 


In the previous section, we have seen how to optimise the location of the 
interpolation data. A next step would be to investigate how much an op¬ 
timisation of the grey value data at these sites could further improve the 
result. To do so, we no longer require that the function value and the value of 
the interpolant must coincide at the knot locations. Instead, we consider an 
approximation problem and require that the overall reconstruction minimises 
the Li error on the considered domain. As in the previous section, we again 
use piecewise linear splines. Such approximations by means of hrst degree 
splines have a long history, and there exist results for many special cases. 
In jl23j . the best approximation of strictly convex functions in the least 
squares sense has been analysed, while [M] cites general conditions for the 
determination of best approximations of strictly convex functions in the Loo 
sense. Theoretical results can also be found in [73]. In [lUUj . it is shown that 
an optimal approximation of convex functions with splines is not necessarily 
unique, a problem which we already mentioned in the stricter case of convex 
spline interpolation. Finally, |3nnro3] supply algorithms for determining such 
approximations. 

In order to minimise the Li error between a strictly convex function 
/ : [a, 6] —)■ M and a piecewise linear spline in an approximation setting, 
we use an algorithm by Hamideh [HI]. It relies on a classical result from 
approximation theory: 










































Theorem 2. For any function f G C{[a,b]), which is strictly convex in 
[a, 6], the optimal straight line approximation to f in the Li sense on [a, 6] 
interpolates the function at the points 

+ ( 22 ) 

Proof. This result can be verified by direct computation. Alternatively, a 
detailed proof can also be found in dn]. □ 

This gives rise to the following algorithm. 

Algorithm 2: Optimal Approximation in ID 
Input: 

N + 1 the number of desired knots. 

Initialisation: 

Choose an arbitrary distribution of A + 1 knots with Cq := a and c^ := b. 

Compute: 

Repeat these steps until a fixed point is reached. 

1. On each subinterval [cj_i,Cj], define the points 

_ 3cj_i + Q _ Cj_i + 3ci , , 

•- -- ^ 

as well as the corresponding line £i-i passing through these points: 


(24) 

Q,2 — Q,1 

2. Determine for all i the new knot position Cj by intersecting the lines 
£j_i and £i, e.g. solve 

£i.i (q) = £, (a) (25) 

for Cj. 

Output: 

The final knot distribution {ci}fLQ. 


The algorithm of Hamideh is similar to our Algorithm 1 for interpolation: 
Both use iteratively locally optimal solutions to perform a global optimisation. 
In addition, the following properties are shown in m- 

1. The resulting sequence of Li errors (-^({cf }(lo)) convergent. 

2. For alH = 1,..., A — 1, it holds that 


lim inf 

fc^OO 




lim 

k^oo 




> 0 , 

= 0 , 


(26a) 

(26b) 


where is our sequence of knot sets. Thus, two distinct knots 

cannot fall together. 











3. If the unknown optimal spline fulfils certain continuity conditions, one 
can guarantee that the above algorithm convergences towards an optimal 
solution. 

Further analytic results on the optimal knot distribution can also be found in 

pg. 


Numerical Experiments We investigate the knot distribution for differ¬ 
ent number of knots and repeat the experiments of Section 4d^ using now 
the algorithm of Hamideh. Our main interest lies in the performance of a 
combined optimisation over a sequential optimisation of the position and the 
corresponding value of the interpolation data. To this end, we also consider a 
tonal optimisation that adjusts the function values for our free knot result. 
This additional task is carried out as a postprocessing step. Since the position 
of the mask values is optimised in an Li setting, we use the same framework 
for the tonal optimisation, too. Due to the fact that for a fixed set of knots, 
we can express our interpolating spline as a linear combination of hrst degree 
B-Splines, we can express the tonal optimisation as a linear regression task 
with respect to the Li norm. It is well-known that such problems can be 
reduced to linear programs which can efficiently be solved by standard solvers 
from the literature. Our hndings are summarised in Tab. and a visual 
comparison between the obtained mask sets for a set of seven knots is given 
in Fig. 1^ 


Table 1. Error measures for our interpolation algorithm and the approximation algorithm of 
Hamideh for different numbers of mask points applied to the function x i—^ exp(2a; — 3) + a; on the 
interval [—4,4]. For our method we list the error without additional tonal optimisation and with 
additional tonal optimisation. 


A^ + l 

Our method 

Hamideh 

no tonal optim. 

with tonal optim. 

5 

12.501 

4.229 

3.982 

7 

5.134 

1.810 

1.748 

9 

2.785 

0.999 

0.977 


Although the knot distribution in Fig. |^for the approach of Hamideh is 
similar to the one found with interpolation, the corresponding errors in Tab. 
are signfficantly lower, since the reconstruction can adapt much better to the 
original function when compared to the interpolation framework. Nevertheless, 
we also observe a substantial gain of the tonal optimisation. Even though 
we cannot outperform the combined optimisation, we achieve competitive 
results, in particular for larger values of N. 

Our investigations in the ID case show that a careful optimisation of 
the positions (spatial information) and greyvalues (tonal information) of 











Fig. 4. Comparison between the knots found with our method (dark grey disks) and the method 
of Hamideh (light grey squares) for the function exp(2a; — 3) + a; along the interval [—4,4] (dotted 
line). 


the data can lead to significant improvements compared to a pnrely spatial 
tnning. Moreover, a seqnential optimisation is almost as good as a combined 
optimisation of the data. The next step in our strategy will be to adapt these 
ideas to the two-dimensional setting such that we can efficiently apply them 
on discrete image data. 


5 Optimisation Strategies in 2D 

Unfortunately, our interpolation algorithm from Section can hardly be 
used directly on 2D image data. First of all, we would be restricted to 
convex/concave images. Of course, one could always segment an arbitrary 
image into convex and concave regions and treat them separately, but in 
many cases this would lead to heavily oversegmented images and a suboptimal 
global distribution of the data points for the reconstruction. Secondly, the 
solution of (|^-(|^ in higher dimensions cannot be written in terms of simple 
piecewise linear interpolation: To characterise it analytically would require 
more complicated expressions that involve Green’s functions |HS]- Therefore, 
we want to consider other approaches here. Nevertheless, they exploit the 
basic ideas and findings of a spatial and tonal optimisation from the previous 
section. 

For practical reasons, we present a two-step optimisation strategy: First 
we consider an interpolation approach to optimise the spatial data. Afterwards 
we optimise the tonal information at the points obtained in the first step. This 
strategy can be justified by the fact that in the ID case, the obtained knot 
distributions for the interpolation and approximation algorithms were similar. 
For the optimisation of the data sites we investigate two methods: An analytic 
approach proposed in [S] that exploits the theory of shape optimisation, and 
our probabilistic sparsification approach from [HI]. Finally we complement 
our pure interpolation framework with a best approximation scheme that 
incorporates tonal optimisation in our model. 








5.1 Optimising Spatial Data 


Analytic Approach In order to approach the question about the optimal 
data selection, Belhachmi et al. jS] use the mathematical theory of shape 
optimisation. This theory optimises topological properties of given objects. 
Belhachmi et al. seek the optimal shape of the set of Dirichlet boundary 
data in 0-ii. They show that the density of the data points should be 
chosen as an increasing function of the Laplacian magnitude of the original 
image. However, this optimality result returns a continuous density function 
rather than a discrete pixel mask. This yields an additional problem that 
is also discussed in [S], namely, how to obtain the best discrete (binary) 
approximation to a continuous density function. Belhachmi et al. suggest 
the following strategy to obtain a point mask based upon \Af\. First one 
applies a small amount of Gaussian presmoothing with standard deviation 
a to obtain f„. This is a common procedure in image analysis to ensure the 
differentiability of the data. Then one computes the Laplacian magnitude 
|A/o-| and rescales it such that its mean represents the desired point density 
given as fraction d of all pixels. Finally any dithering algorithm that preserves 
the average grey value can be applied to obtain the binary point mask. 

In [S] the classical error diffusion method of Floyd and Steinberg [HI] is 
used. However, we favour the more sophisticated electrostatic halftoning mni 
over simpler dithering approaches, since it has proven to yield very good 
results for discretising a continuous distribution function. Figure]^ shows the 
superiority of electrostatic halftoning over Floyd-Steinberg dithering in terms 
of the mean squared error (MSE) 


MSE(u,/) = ^^(/,-«y. 

I ieJ 


(27) 


where u denotes the reconstruction, / the original image and J the set of 
all pixel indices. Since any dithering method introduces errors, it remains an 
open question if this is the most suitable approach to discretise the continuous 
optimality result. 

The theory of Belhachmi et al. jS] demands the data points to be chosen 
as an increasing function of \ Af\. However, the optimal increasing function 
depends on the details of the underlying model. One option is to use the 
identity function of the Laplacian magnitude, as was done in ED. In the 
present work, we introduce an additional parameter s > 0 and dither \ Af^\^ 
instead. This choice can also be motivated from the original paper [S] and 
allows to tune the density of the selected points in homogeneous regions. The 
complete method, which we call the analytic approach, is summarised below. 





Fig. 5. (a) Original test image trui (256 x 256 pixels), (b) Smoothed Laplacian magnitude of (a) 
using (T = 1 (rescaled and inverted), (c, d) Dithered versions of (b) using Floyd-Steinberg error 
diffusion and electrostatic halftoning, respectively. With (c) and (d) as mask for homogeneous 
diffusion inpainting we obtain an MSE of 138.98 and 101.14, respectively. All the images use 
floating point values in the range from 0 to 255 for the pixels. The discrete masks have a density 
of 4%. 


Algorithm 3: Analytic Approach 
Input: 

Original image /, Gaussian standard deviation a, exponent s, desired 
pixel density d. 

Compute: 

1. Perform Gaussian presmoothing with standard deviation cr: 

2. Gompute 

3. Rescale |A/o-|® to 

d ■ /max |S 

mean(|AM*) ' ' 

where /max is the maximal possible grey value. 

4. Apply electrostatic halftoning to obtain c. 

Output: 

Discrete pixel mask c. 



Fig. 6. (a, c) Original test images waiter (256 x 256 pixels) and peppers (256 x 256 pixels), (b, 
d) Best reconstruction results with 4% of all pixels, using probabilistic sparsification, nonlocal 
pixel exchange and grey value optimisation. The image (b) has an MSE of 12.45, while image (d) 
has an MSE of 25.10. 


In order to evaluate the analytic approach, let us apply it on the test 
image trui (see Fig. [^a)). We aim at a mask pixel density of 4% (d = 0.04) 












of all pixels. The parameters a and s are chosen snch that the MSE of the 
reconstruction becomes minimal. This was achieved with a = 1.6 and s = 0.8. 
For comparison, we also consider two masks with the same amount of pixels: 
a mask with points on a regular grid and a randomly sampled mask. Figure]^ 
shows these two masks and the one created with the analytic approach in 
the hrst column as well as the corresponding reconstructions in the second 
column. We observe that the reconstruction quality highly benehts from a 
dedicated point selection. These results are conhrmed by the Columns 3, 4 
and 5 of Tab. which depicts quantitative results for several test images 
from Fig. [^a), Fig. [^a), and Fig. [^c). 

As already mentioned, the analytic approach is real-time capable if a fast 
dithering method is used. However, rather than on speed, the focus of our 
present work is to maximise the quality of the reconstruction. Therefore, in 
the next subsection we present an alternative algorithm that is slower but 
outperforms the analytic approach in terms of quality. 

Probabilistic Sparsification We have seen that the analytic approach 
offers a clean strategy how to choose optimal spatial data in a continuous 
image. However, due to certain degrees of freedom in the modelling, errors 
caused by the dithering algorithm, and discretisation effects, its results on 
digital images cannot be optimal. As an alternative, we consider now a 
discretise-then-optimise strategy. This way we can directly search for a 
binary-valued mask c by working with the discrete inpainting formulation 
from Q. An immediate consequence of this approach is that there are only 
hnitely many combinations for c. Unfortunately, already for an image of size 
256x256 pixels and a desired pixel density of 4% there are (^ 2 ^ 2 ^) ~ ^ 
possible masks. To tackle this combinatorial problem we suggest a method 
called probabilistic sparsihcation. It uses a greedy strategy to reduce the 
search space. 

Let / be a given discrete image and let r(c, /) be the function that 
computes the solution u of the discrete homogeneous inpainting process 0 
with a mask c: 

r(c, f):=u={C-{I- C)A)”' Cf. (28) 

The goal is to hnd the pixel mask c that selects a given fraction d of all pixels 
and minimises MSE {u, /). 

Starting with a full mask, where every pixel is chosen, probabilistic sparsi¬ 
hcation iteratively removes the least signihcant mask pixels until a desired 
density is reached. More specihcally, we randomly choose a fraction p of 
candidate pixels from the current mask. These pixels are removed from the 
mask, and an inpainting reconstruction is calculated. The signihcance of a 
candidate pixel can then be estimated by computing the local error, i.e. the 
squared grey value difference of the inpainted and original image in this pixel. 


Then we permanently remove the fraction q of the candidates that exhibit the 
smallest local error from the mask, and we insert back again the remaining 
fraction (1 — g) of the candidates. A detailed description of our algorithm is 
given below. 

Algorithm 4: Stochastic Sparsification 

Input: 

Original image /, fraction p of mask pixels used as candidates, fraction 
q of candidate pixels that are removed in each iteration, desired pixel 
density d. 

Initialisation: 

C ;= diag (1,..., 1)"'^ and K := J. 

Compute: 

Do 

1. Choose randomly a candidate set T of p ■ \K\ pixel indices from K. 

2. For alH G T set c* := 0. 

3. Compute u ;= r{c, f). 

4. For alH G T compute the error e* = {ui — fiY- 

5. For all i of the (1 — g) ■ |T| largest values of {e* | i G T} reassign Cj := 1. 

6. Remove the indices i ^ T from K and clear T. 
while \K\ > d ■ \J\. 

Output: 

Pixel mask c, such that Ci = d ■ \J\ 


The larger the parameters p and g are chosen, the faster the algorithm 
converges, since in each step, p ■ q ■ \K\ pixels are removed. After k steps 
there are (1 — pq)^ ■ | J| mask pixels left. Hence, for a density d, the algorithm 
terminates after at most iterations, where [■] denotes the ceiling 

function, giving the smallest integer not less than its argument. 

Because there is a global interdependence between all selected mask 
pixels, probabilistic sparsihcation cannot guarantee to give optimal solutions. 
Therefore, the question arises how the parameters p and g influence the quality 
of the resulting mask. To this end, we run several experiments with different 
p and g. The results are depicted in Tab. Note that we set the candidate 
set as well as the set of pixels that are removed to 1 if p or g would lead 
to sets smaller than one pixel. The optimal p is usually not very large: If 
too many candidates are removed from the mask, the local error does not 
provide enough information to select good pixels to remove permanently. The 
parameter g can usually be chosen as small as possible, i.e. such that only 
one candidate is removed in each iteration. For our test images, this was the 
case for g = 10“®. With larger values for g, the probability that we remove 
important pixels increases. 

Tab. [^illustrates the robustness of the algorithm. Note that the approach is 
not deterministic and thus always returns different results since the candidates 







Table 2. Influence of the parameters p and q of probabilistic sparsiflcation. There have been 
in total 100 runs for each pair (p, q) on the test image trui with desired pixel density d = 0.04. 
Numbers in the table are the mean and standard deviation of the MSE. Again all pixel values he 
in the interval [0, 255]. 


g 


0.01 


0.02 


0.05 


0.1 


0.2 


0.3 


0.4 


10 “® 
10 “® 
10 “^ 
2 • 10 “^ 
5 • 10 “^ 
10 “^ 


103.7 ± 1.88 98.2 ± 1.72 87.9 ± 1.61 77.6 ± 1.40 67.7 ± 1.40 66.1 ± 1.36 70.7 ± 1.76 

103.7 ± 1.77 98.0 ± 1.87 87.7 ± 1.74 77.4 ± 1.33 67.8 ± 1.26 66.3 ± 1.41 70.5 ± 1.67 

103.1 ± 1.69 98.3 ± 1.91 88.9 ± 1.76 81.8 ± 1.68 73.0 ± 1.44 68.9 ± 1.81 69.4 ± 1.84 

103.7 ± 1.99 98.7 ± 1.74 92.6 ± 1.57 85.3 ± 1.71 76.4 ± 1.78 71.4 ± 1.56 70.6 ± 1.83 

104.9 ± 2.19 102.7 ± 1.96 98.1 ± 2.07 91.6 ± 1.82 82.7 ± 2.02 77.1 ± 2.03 74.6 ± 1.79 

110.3 ± 2.36 107.2 ± 2.63 103.7 ± 2.26 97.7 ± 2.14 89.5 ± 2.00 83.9 ± 2.22 80.8 ± 2.24 


are chosen randomly at each iteration. Althongh the obtained masks for 
different seeds differ in most of the selected pixels, we obtain qnalitatively 
comparable resnlts: The standard deviation does not exceed a valne of 2.6 
and is even smaller for optimal values for p and g, when the algorithm is run 
several times. 

In Fig. 1^ the images in the third row show the results of the probabilistic 
sparsiflcation for the test image trui with a mask pixel density of 4% [d = 0.04). 
To optimise the quality, we use p = 0.3 and q = 10“® (cf. Tab. |^. Both the 
visual as well as the quantitative results outperform the ones of the analytic 
approach. This can also be observed for the two test images waiter and 
peppers, as is shown in Tab. The parameters p and q are optimised for each 
image individually. 


Nonlocal Pixel Exchange As we have seen in the previous section, proba¬ 
bilistic sparsiflcation outperforms the analytic approach. Nevertheless, it is 
not guaranteed to And an optimal solution. An obvious drawback of proba¬ 
bilistic sparsiflcation is the fact that due to its greedy nature, once a point 
is removed, it will never be put back into the mask again. Thus, especially 
at later stages, where only few mask pixels are left, important points might 
have been removed, so that we end up in a suboptimal local minimum. We 
now present a method called nonlocal pixel exchange that allows to further 
improve the results of any previously obtained mask, in our case the one from 
probabilistic sparsiflcation. It starts with a sparse, possibly suboptimal mask 
that contains already the desired density d of mask pixels. In each step, it 
randomly selects a set of m non-mask pixels as candidates. The candidate 
that exhibits the largest local error is then exchanged with a randomly chosen 
mask pixel. If the inpainting result with the new mask is worse than before, we 
revert the switch. Otherwise we proceed with the new mask. By construction, 
the nonlocal pixel exchange can only improve the result. This algorithm 
always converges towards an optimal solution in terms of a exchange of two 













pixels. Since we exchange at each iteration the same nnmber of candidate 
pixels it follows that this approach is not eqnivalent to an exhaustive search 
through all possible combinations. Thus, one cannot guarantee convergence 
towards the global minimum. The description below shows the details of the 
algorithm. 

Algorithm 5: Nonlocal Pixel Exchange 

Input: 

Original image /, pixel mask c, size m of candidate set, the set K of pixel 
indices of the mask c. 

Initialisation: 

u := r(c, /) and := c. 

Compute: 

Repeat 

1. Choose randomly m <\K\ pixel indices i from J\K and compute the 
local error e* := {ui — 

2. Exchange step: 

Choose randomly a j G A and set c"®’" := 0. 

For the largest value of e*, set c°®" := 1. 

3. Compute := r(c’^®®, /). 

4. If MSE (w, /) > MSE (w^®*, /) 

u := and c := c“®". 

Update K. 
else 

Reset c°®" := c. 

until no pairs can be found for exchange. 

Output: 

Optimised mask c“". 


As for the previous algorithms, we are interested in an optimal parameter 
selection. Table |3]shows the results for different choices of m when the nonlocal 
pixel exchange is applied to the masks from the previous section (randomly 
selected, regular grid, and analytic approach from Fig. and in addition 
the one we obtained by probabilistic sparsification (also Fig. [^. Technically 
one could always choose m = f, but a noticeable speedup can be obtained 
by choosing a larger m. In our experiments, values around m = 20 result in 
fastest convergence. 

The nonlocal pixel exchange improves the masks from any method we 
have considered so far; see Fig. Especially within the first few iterations we 
achieve significant quality gains. After 500,000 iterations, we reach with all 
three masks an MSE below 45. The best reconstruction with an MSE of 41.92 
is obtained with the mask from probabilistic sparsification. It is depicted in 
Fig. II As for the previous methods. Tab. [^provides also quantitative results 
for the test images waiter and peppers. They support the above observations. 




Mean square error (MSE) 


Table 3. Mean squared error after 500,000 iterations for different values for m, when the nonlocal 
pixel exchange is applied to different masks of the test image trui (see also Fig. [^. The best result 
for each mask is marked in boldface. 


m 

mask - 



1 

5 

10 

20 

30 

40 

50 

100 

randomly selected 

49.88 

46.43 

44.99 

44.78 

45.00 

45.07 

45.25 

48.22 

regular grid 

49.67 

45.79 

45.19 

44.76 

45.16 

45.56 

45.76 

48.33 

analytic approach 

49.19 

45.88 

45.14 

44.70 

45.33 

45.56 

46.13 

49.31 

probabilistic sparsification 

43.72 

42.45 

42.34 

41.92 

41.97 

42.49 

42.19 

43.65 
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Fig. 7. Convergence behaviour for the first 50,000 iterations, when the nonlocal pixel exchange is 
applied to different masks (cf. Figure |^a,d,g)) of the trui test image with optimal parameter m 
(cf. Table 1^. 


























5.2 Optimising Tonal Data 


So far, all our 2D optimisation approaches only propose a solution for the 


spatial optimisation problem. However, the results in Section 4.2 show that 
a tonal optimisation, i.e. an optimisation of the grey values, can be very 
worthwhile. From a data compression point of view, it is important to notice 
that changing the grey values at the chosen data points does not increase the 
amount of data that needs to be stored. The quality improvements, on the 
other hand, can be remarkable. In this section, we present an approach that 
allows us to determine the optimal grey values for any given mask. 

In order to hnd the optimal grey values g for a hxed mask c, we consider 
the following minimisation approach: 


argmin{|/ - r{c,g) p} , 

9 


(29) 


where | ■ | is the standard Euclidean norm, f denotes the original image, and 


r{c,g) the reconstruction from (28). Due to the linearity of r with respect 
to g, this is a linear least squares problem. In our next steps, we analyse its 
well-posedness properties and propose an efficient numerical algorithm. 


Existence and Uniqueness Results Let Cj denote the i-th canonical basis 
vector of i.e. = 1 ii i = j, and 0 otherwise. Then we call r{c,ei) 
the inpainting echo of pixel i. Since r is linear in g we can express the 
reconstruction n as a superposition of its inpainting echoes: 

u = r{c,g) = rlc^Y^gieij = Y.gir{c,ei). (30) 

^ ieJ i£J 


Since r(c, e*) is 0 for Cj = 0 (i.e. for i E J \ K), we can simplify this to a 
summation over K: 

r{c,g) = gir{c,ei). (31) 


i&K 


For our minimisation problem (29), this means that the coefficients gi can be 
chosen arbitrarily if z G J\K. For simplicity, we fix them at 0. The remaining 
gi with i E K can be obtained by considering the least squares problem 


argminjlR^x - /P| 

9K ^ ^ 


(32) 


where gx = {gi)i£K is a vector of size \K\, and B is a \J\ x \K\ matrix that 
contains the vectors {r(c, e*) | i E K} as colnmns. The associated normal 
equations are given by 

B^BgK = B^f. (33) 

By constrnction, the \K\ x \K\ matrix B^B is positive semidehnite. It is, 
however, not obvious that the eigenvalue 0 cannot appear. The theorem below 
excludes such a singular situation. 





Theorem 3. Let K he nonempty. Then the matrix B is invertible, and 
thus the linear system (33) has a unique solution. 


Proof. Mainberger et al. have proven that for a nonempty set K, the 
\J\ X \J\ matrix M = C — {I — C)A is invertible, i.e. exists. Moreover, 
ioT i E K we have 


r(c, Ci) = M 


-1 


Ce, ‘= 


(34) 


This shows that r(c, e*) is the i-th column of M~^. Since M is invertible, 
also is regular. Thus, all column vectors of have to be linearly 

independent. In particular, this implies that also all columns {r{c, ei) \ i E K} 
of the |J| X \K\ matrix B are linearly independent. Therefore, BAB is 
invertible, and the linear system (33) has a unique solution. □ 


Numerical Algorithm To hnd the solution of the minimisation prob¬ 
lem (29), Mainberger et al. IHI] have solved the associated normal equa¬ 


tions (33) iteratively. In particular, they have chosen a randomised GauB- 


Seidel scheme that updates the grey values at the individual mask points 
one after another. Alternatively, one could also employ different iterative 
solvers or solve the system of equations with direct methods such as LU- 
or QR-decompositions [52] ■ In general, however, one has to state that typ¬ 
ical methods which rely on inpainting echoes suffer from a relatively high 
computational cost to obtain the individual echoes. Although one can pre¬ 
compute and reuse them for the subsequent iteration steps, they still need 
to be computed at least once. This is particularly inefficient when a large 
amount of mask points is present. Moreover, precomputing inpainting echoes 
leads to higher memory requirements. The more recent strategies in [221 El] 
avoid the direct computation of the inpainting echoes. Instead, the original 


minimisation problem (29) is solved directly with the help of primal-dual 


methods or related sophisticated optimisation strategies. This gives more 
efficient algorithms for tonal optimisation. 

In the following, we propose an alternative approach to hnd optimal tonal 
data. It is based on an accerated gradient descent strategy that allows an 
efficient grey value optimisation. Similar to [291IM] , we consider the original 


minimisation problem (29) directly. Thus, we also avoid computing inpainting 


echoes. This leads to a reduced runtime as well as to low memory requirements. 
Below we hrst explain the classical gradient method with exact line search. 
Afterwards we present a novel variant that benehts from an accelation with a 
fast explicit diffusion scheme in the sense of Grewenig et al. 

Our goal is to minimise the objective function 


r(g) = ^ \r{c,g) - /I 


(35) 










Starting with an initialisation = Cf, a gradient descent scheme minimises 
this energy E by iteratively updating the current grey values for A; > 0 as 


gk+i = gi^ - aVE{g^ 


(36) 


with a step size a and the gradient ^E{g^) depending on the iterates g'^. 
Denoting the inpainting solution at iteration step k by i.e. := r(^c,g^'^, 
the gradient can be written as 


VEig>^) = JT (u^ - f 


(37) 


where J is the Jacobian of r(^c,g^^ with respect to the second component. 


With the definition of r[c,g^) from (28), we obtain 


= ({C - {I - C)A)~^ C 


= C{C-A{I~C)) 


-1 


(38) 


where we have exploited the symmetries of the matrices C and A. Computing 
the iterates and the gradient E{g^) means to solve a linear system of 
equations for each of them. This can be done efficiently with a bidirectional 
multigrid solver as is suggested in PO] . 

The main parameter that we have to specify is the step size a. As one 
possibility, it can be optimised to yield the largest possible decay of the energy 
in each step. This comes down to the least squares problem 

argminjl/ - ric.g'"-aV E{g’^))\^\. (39) 

Exploiting the linearity of r in the second component allows to obtain the 
minimiser in closed form: 


(/-u‘)Tr(c,Vi5(9‘)) 

“ |r(c,VB(g‘))|2 ■ * ' 

This strategy is also known as exact line search. It guarantees that the 
sequence {g^)k converges to the minimum of the energy 113]. As stopping 
criterion, we consider the relative norm of the gradient, which should approach 
zero at the optimum. In other words, we stop as soon as 

\VE{g^)\^ <e\VE{g^)\^ (41) 

with some small number e > 0. An algorithmic overview of the classical 
gradient descent method with exact line search is shown below. It serves as 
our baseline method. 




Algorithm 6: Grey Value Optimisation with Exact Line Search 


Input: 

Original image /, pixel mask c. 

Initialisation: 

g^:=Cf, k:=0. 

Compute: 

Repeat for fc > 0 

1. Compute the gradient 

WE{g‘‘):= jT(r(c./)-/). 

2. Determine the step size a with Equation ( [40| ). 

3. Update the tonal data: 

,= g^-aVE{g^). 


until the stopping criterion (41) is fulhlled. 
Output: 

Optimised grey values g. 


(42) 


(43) 


In order to speed up the gradient descent approach, we propose an accel¬ 
erated algorithm based on a so-called fast explicit diffusion (FED) scheme. 
First applications of FED to image processing problems go back to Grewenig 
et al. [5H|. FED can be used to speed up any explicit diffusion-like algorithm 
that involves a symmetric matrix. While classical explicit schemes employ a 
constant time step size that has to satisfy a restrictive stability limit, EED 
schemes involve cycles of time step sizes where up to 50 % of them can violate 
this stability limit. Nevertheless, at the end of each cycle, stability in the 
Euclidean norm is achieved. In contrast to classical explicit schemes that 
reach a stopping time of order 0{M) in M steps, EED schemes with cycle 
length M progress to 0{M'^). This allows a very substantial acceleration. 

Since the gradient descent scheme can be seen as an explicit scheme with 
a symmetric matrix, EED is applicable: If a* denotes a hxed step size for 


which (36) is stable in the Euclidean norm, one replaces it by the cyclically 


varying step sizes 


1 


CK,- = a 


2 COs2 ( TT ■ ^ 


iM+2 ) 




(44) 


For large cycle lengths M, one should permute the order of the step sizes to 
tame rounding errors; see jl33] for more details on this and an exhaustive 
explanation of the FED framework in general. The FED cycles should be iter¬ 


ated until the stopping criterion (41) is fulfilled. A related cyclic optimisation 


strategy has also been investigated in jl2()j . 













In order to determine the individual step sizes within each cycle, we have 
to hnd a step size a* that is within the stability limit. The following result is 
well-known in optimisation theory If E{g) is continuous and its gradient 
is Lipschitz continuous, i.e. there is a constant L sucht that 

\VE{gi)-VE{g2)\<L-\gi-g2\ (45) 


for all gi and g 2 , then the gradient descent scheme is stable for all step sizes 
a* fulhlling 

0<a* <y. (46) 


It is straightforward to verify that in our case L can be chosen as the squared 
spectral norm of the inpainting matrix D ;= M~^C, i.e. 


L = \DY := p{D'D) 


(47) 


where p{D^D) denotes the spectral radius of the symmetric matrix D. 
One possibility to estimate the spectral radius is to use Gershgorin’s circle 
theorem. However, this may give a too pessimistic estimate. Instead, we 
propose to use the power method to determine the maximum eigenvalue of 

D; see e.g. |2]. The convergence of this method turns out to be relatively 
fast, such that one obtains already a reasonable estimate of the spectral radius 
after 5 iterations. 

Although it may appear tempting to choose a* close to the stability limit 
I;, this can result in a suboptimal convergence speed, since high frequent error 
components are damped too slowly. Our experiments suggest that a good 
choice for a* is two third of the stability limit: 


a = 


3L' 


(48) 


Similar strategies are also common e.g. in multigrid approaches that use a 
damped Jacobi method with damping factor | as a baseline solver na. 

Below we give an algorithmic overview over all steps to perform tonal 
optimisation with FED-accelerated gradient descent: 

Algorithm 7: Grey Value Optimisation with FED 
Input: 

Original image /, pixel mask c, FED cycle length M. 

Initialisation: 

g^.= Cf, k-.= 0. 

Compute: 


1. Estimate L in (47) with the power method. 


2. Determine the EED time steps oq, • • •, «m-i according to (44) with 
a* = T^. If necessary, permute them. 

3. Repeat for A: > 0 





(a) := 

(b) For i = 0,..., M — 1 do 

i. Compute the gradient 

:= -/). 

ii. Update the tonal data: 


_ gk,M 


nntil the stopping criterion (41) is fnlhlled. 
Output: 

Optimised grey valnes g. 


(49) 


(50) 


Since the grey value optimisation problem is strictly convex, all convergent 
algorithms yield the same minimiser and are therefore qualitatively equivalent. 
They only differ by their run times. The following experiment gives an 
impression of realistic run times of both gradient descent algorithms for 
greyvalue optimisation. As before, we consider the inpainting problem with 
the test image trui (256 x 256 pixels) and 4 % mask density. The stopping 
parameter for our iterations was set to e := 0.001. With a C implementation 
on a desktop PC with Intel Xeon processor (3.2GHz), the exact line search 
algorithm requires 458 seconds to perform 262 iterations. A corresponding 
FED algorithm with a* = 0.01 needs only 77 seconds to compute 4 cycles of 
length M = 15. In comparison, a CPU implementation of the primal-dual 
approach of Hoeltgen and Weickert [61] requires for the same problem a 
run time of 346 seconds. This illustrates the favourable performance of the 
FED-accelerated gradient descent method. 

As the FED algorithm is based on an explicit scheme, it is also well-suited 
for implementations on parallel hardware such as GPUs. This can lead to 
very substantial additional accelerations. 

Qualitative Evaluation In order to evaluate the capabilities of the grey 
value optimisation, we apply it to the masks obtained so far for the test 
image trui. In all cases this results in a clear improvement of the recon¬ 
struction quality, both visually and in terms of their MSE; see Eig. as 
well as Tab. This confirms also our impressions from the ID scenario 
in Section |4.2[ Especially suboptimal masks benefit a lot from grey value 
optimisation: It can compensate for many deficiencies that are caused by an 
inferior spatial selection strategy. This becomes particularly apparent when 
considering the result for the random mask or the mask on the regular grid. 
It is remarkable how much gain in quality is possible by simply choosing 
different grey values. However, also the reconstruction we obtain with the 
best mask using probabilistic sparsification and nonlocal pixel exchange can 






be improved substantially: The MSE is reduced from 41.92 to 27.24. Similar 
improvements can also be observed for the images waiter and peppers] see 
Tab. 1^ The best reconstructions are depicted in Fig. 


Table 4. Comparison of the reconstruction error (MSE) with 4% of all pixels for different test 
images and different inpainting data. The parameters were applied to both approaches, with and 
without grey value optimisation (GVO). The nonlocal pixel exchange was performed with 5 • 10® 
iterations for each experiment. 


image 

GVO 

randomly 

selected 

regular 

grid 


analytic 

approach 

probabilistic 

sparsification 

nonlocal 
pixel exchange 

trui 

no 

yes 

273.10 

151.25 

181.72 

101.62 

84.04 

42.00 

(cr = 1.6, 
s = 0.80) 

66.11 

36.04 

(p = 0.3, 
q = 10-q 

41.92 

27.24 

(m = 20) 

waiter 

no 

yes 

297.07 

155.50 

184.00 

91.97 

39.85 

20.19 

(cr = 1.5, 
s = 1.00) 

32.96 

19.24 

(p = 0.2, 
q = 10-q 

18.37 

12.45 

(m — 30) 

peppers 

no 

yes 

278.61 

156.15 

185.04 

104.41 

70.05 

43.77 

(cr = 1.5, 
s = 0.95) 

44.85 

28.58 

(p = 0.1, 

q = 10-q 

29.63 

25.10 

(m — 30) 


6 Extensions to Other Inpainting Operators 

We have seen that optimising the interpolation data allows to obtain high 
qnality reconstrnctions with only 4% of all pixels. These resnlts are also 
remarkable in view of the fact that so far we have used a very simple interpo¬ 
lation operator: the Laplacian. It is unlikely that it offers the best performance. 
In this section we investigate the question if the algorithms of Section can be 
used with, or extended to more advanced inpainting operators. We focus on 
two representative operators that have proven their good performance in the 
context of PDE-based image compression with sparse data [23 E31 IMl IllSj : 
the biharmonic operator and the EED operator. 

A straightforward extension of the Laplacian is the biharmonic operator, 
i.e. we replace Au in (|^ by —A'^u. Using it for interpolation comes down to 
thin plate spline interpolation [13], a rotationally invariant multidimensional 
generalisation of cubic spline interpolation. Compared to the Laplace operator, 
it yields a smoother solution u around the interpolation data, since its Green’s 
function is twice differentiable. This avoids the typical singularities that distort 
the visual quality with homogeneous diffusion inpainting. These artifacts are 
cansed by the logarithmic singnlarities of the Green’s fnnction of the two- 
dimensional Laplacian. On the other hand, biharmonic inpainting is prone to 
over- and undershoots, i.e. the values of u may leave the range of the inpainting 
data in K. This cannot happen for homogeneous diffusion inpainting which 
fulhls a maximum-minimum principle. Nevertheless, a number of evaluations 
show that biharmonic inpainting can offer a better reconstrnction qnality 
than homogeneous diffusion inpainting [23 [S3 Ell 1118] . 

Secondly we consider an anisotropic nonlinear diffusion operator, namely 
edge enhancing-diffusion (EED). Originally it has been introduced for image 
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MSE: 181.72 MSE: 101.62 



MSE: 84.04 MSE: 42.00 



MSE: 66.11 



MSE: 36.04 



MSE: 41.92 
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Fig. 8. Evaluation of different inpainting data using 4% of all pixels. Left column: Different 
masks obtained by using a regular grid, the analytic approach (s = 0.80, o = 1.6), a probabilistic 
sparsification {p = 0.3, q — 10“®), and with an additional nonlocal pixel exchange (m = 20, 5 • 10® 
iterations) after the previous probabilistic sparsification. Middle column: Reconstructions with 
homogeneous diffusion inpainting with the masks from the first column. Right column: Same as 
in the middle column, but with optimal tonal data. 













denoising USB. and its application in image compression goes back to Galic 
et al. |n3]- Using EED means that we replace Au in ([^ by div{D{'Vua)'Vu). 
The positive dehnite matrix D{'Vua) is the so-called diffusion tensor. It steers 
the diffusion process by its eigenvectors and eigenvalues. They depend on 
the gradient of a Gaussian-smoothed version Ua- of the image u, where a 
denotes the standard deviation of the Gaussian. The hrst eigenvector of D is 
chosen to be orthogonal to and the corresponding eigenvalue is fixed 

at 1. This gives full diffusion / inpainting along image edges. In contrast, 
the second eigenvector is chosen to be parallel to and its eigenvalue is 

a decreasing function of the local image contrast |VM(j|. Thus, one reduces 
diffusion / inpainting across high contrast edges. For image compression, 
one usually chooses the diffusivity of Gharbonnier et al. |2S], which is given 
by (1 -|- |Vmo-P/A^) The parameter A > 0 allows to steer the contrast 
dependence. 

Image inpainting with EED can reconstruct edges in high quality, even 
when the specified data is sparse |118j . This explains why EED has become 
one of the best performing operators for PDE-based image compression 
ISSHSHEH]. Moreover, for second order elliptic differential operators such as 
EED, the continuous theory guarantees a maximum-minimum principle nag. 
Experiments show that in contrast to homogeneoues diffusion inpainting, 
EED does not suffer from singularities jll8j . 

What are the changes in our framework, when we replace the Laplacian by 
the biharmonic or the EED operator? If we discretise the biharmonic operator 
with central finite differences, we have to exchange the matrix A in ([^ by 
another constant matrix, but the structure of this equation remains the same 
as for homogeneous diffusion inpainting: It is a linear system of equations in 
the unknown vector u. For EED, however, the discrete differential operator 
A depends nonlinearly on the reconstruction u. Thus, the resulting system 
of equations becomes nonlinear: 

C{u-f)-{I-C)A{u)u = 0. (51) 

While this nonlinearity does not affect our spatial optimisation, we will see 
that it makes the tonal optimisation more difficult. 

6.1 Optimising Spatial Data 

Let us consider the spatial optimisation hrst. The theory behind the analytic 
approach is strongly based on homogeneous diffusion, and no extensions to 
more sophisticated inpainting operators have been discussed in |H]. Therefore, 
we do not consider this approach in this section. 

In contrast, the probabilistic approach can be used without any restrictions 
and changes for both the biharmonic operator as well as for EED. As before, 
we search for the best parameters of the spatial optimisation methods. In all 









our experiments, we fix the EED parameters to A = 0.8 and a = 0.7, since 
this gives reconstructions of high quality. 

Results for both operators and methods in comparison to homogeneous 
diffusion inpainting are shown in Tab. Interestingly, the biharmonic operator 
does not achieve better results than the homogeneous one in the case of pure 
probabilistic sparsihcation without nonlocal pixel exchange. Our explanation 
for this observation is as follows: Biharmonic inpainting has a positive effect 
of increased smoothness around data points, and a negative effect by over- 
and undershoots in the inpainting domain. The latter one is ignored in the 
sparsihcation decision, since the point selection is based on a purely local 
error measurement. Thus, for very sparse data, over- and undershoots can 
become particularly detrimental on global error measures such as the MSE. 
In our experiment, the minimum and maximum values of the reconstruction 
he around —59 and 346, respectively. This is far beyond the range of the 
original image which is given by [56, 214]. With an additional nonlocal pixel 
exchange, we can overcome this problem, since it uses the MSE as criterion to 
redistribute the mask pixels to more favourable locations. As a consequence, 
the grey value range of the reconstruction shrinks to the interval [47, 248], 
and the MSE falls from 79.26 to 20.89. This is far better than the MSE of 
41.92 that we achieve for homogeneous diffusion inpainting. 

However, EED with probabilistic sparsihcation gives far superior results. 
Already without nonlocal pixel exchange, we obtain an MSE of 24.20. Post¬ 
processing with nonlocal pixel exchange reduces the error to only 12.62. 

Figures [^a) and (c) depict the masks obtained with probabilistic spar¬ 
sihcation combined with the nonlocal pixel exchange for both operators. 
Comparing them with the one from homogeneous diffusion inpainting (see 
Fig.|§, we observe interesting differences: Homogeneous diffusion inpainting 
requires more pixels near edges to represent these discontinuities well. Thus, 
for a specihed pixel number, it has less pixels to approximate the interior of 
the regions. This explains why biharmonic or EED inpainting can achieve 
lower errors. 


Table 5. Comparison of the reconstruction error (MSE) with 4% of all pixels for different 
inpainting operators and different data optimisation algorithms. The parameters were applied to 
both approaches, with and without grey value optimisation (GVO). The nonlocal pixel exchange 
was performed with 5 • 10® iterations for each experiment. 


operator 

reg. grid 

probab. sparsif. 

nonl. pixel exch. 

with GVO 

homogeneous 

181.72 

66.11 {p = 0.3, q = 10^®) 

41.92 (m = 20) 

27.24 

biharmonic 

107.96 

79.26 (p = 0.005, q = 10^®) 

20.89 (m = 10) 

16.73 

EED (A = 0.8, CT = 0.7) 

102.85 

24.20 (p = 0.05, q = 10^®) 

12.62 (m = 30) 

10.79 










Fig. 9. Best mask and reconstruction for the biharmonic operator (a, b) and EED (c, d) for the 
test image trui with 4 % of all pixels, using probabilistic sparsification, nonlocal pixel exchange, 
and tonal optimisation. The image (b) has an MSE of 16.73, while image (d) has an MSE of 10.79. 
Parameters have been chosen as in Table H] 


6.2 Optimising Tonal Data 


We have seen that for homogeneous diffusion inpainting, an additional tonal 
optimisation yields signihcant improvements in the reconstruction quality. 
Thus, we should also investigate if it is benehcial for biharmonic and EED 
inpainting. 

In the case of biharmonic inpainting, our tonal optimisation strategy from 


Subsection |5.2| carries over literally, since we are still facing a linear least 
squares problem: For a hxed inpainting mask, the reconstruction depends 
linearly of the specihed grey values. Thus, all we have to do is to exchange 
the discrete differential operator for homogeneous diffusion inpainting by its 
biharmonic counterpart. Also our algorithms such as the FED-accelerated 
gradient descent remain applicable. The hnal results for biharmonic inpaint¬ 
ing with spatial and tonal optimisation are listed in Tab. and the best 
reconstruction is depicted in Fig. [^b). As one can see, tonal optimisation 
allows to reduce the MSE from 20.89 to 16.73. 

Since our tonal optimisation methods are tailored towards linear least 
squares formulations, specihc challenges arise when we want to extend them 
to EED-based inpainting. The nonlinearity of the EED inpainting scheme 


prevents closed form expressions such as (37) and (38). This is caused by the 
fact that the matrix A is now a function of the inpainted image u^, which 
itself depends on the grey value data in the specihed pixels. Although this 
concatenated mapping is formally smooth, one should keep in mind that EED 
has the ability to create edge-like structures. This means that in practice the 
problem is fairly ill-conditioned: Small local changes in may have a strong 
global impact on on A{u^), and on the Jacobian of the error function 


(35). Moreover, there is no guarantee anymore that the tonal optimisation 


problem is strictly convex. Thus, it may have many local minimisers. It is 
therefore not surprising that different algorithms and even different parameter 
settings within the same algorithm may end up in different minimisers. Since 
it is difficult to design practically leasable algorithms that guarantee to hnd 
the global minimiser, we focus on transparent and conceptually simple local 












optimisation strategies such as gradient descent. We have done a number of 
experiments with several variants of gradient descent approaches for tonal 
optimisation in EED-based inpainting. Below we describe the method that 
has yielded the best results in terms of reconstruction quality. 

We suggest to modify the gradient descent approach for (35) as follows. 
While we keep the basic structure of (36) and (37), we lack a closed form 
solution of type (38) for the Jacobian J that is now an unknown function of 
the evolving grey value data g^. As a remedy, we approximate the Jacobian 
by numerical differentiation: 




r(c, r/Cj), - r(c,gr* 
V 


(52) 


where i denotes the pixel index, j is the index of an individual mask point, 
and the parameter rj > 0 quantihes a small grey value perturbation at the 
mask point j. As before, ej is the canonical basis vector with a unit impulse 
in pixel j. Note that the grey values at all non-mask points do not have any 
influence on the inpainting result. Thus, the Jacobian does not have to be 
computed for those locations. 

In contrast to our linear grey value optimisation strategies, we abstain 
from adapting the gradient descent step size a by means of exact line search 
or cyclic FED-based variations: We have experienced that the high sensitivity 
of the Jacobian w.r.t. suggests to keep the gradient descent step size a 
fairly small to capture this dynamics in an adequate way. On the other hand, 
too small values for a can also result in a convergence towards a fairly poor 
local minimiser. Thus, in practice, one hxes a to some small value. 

Similar considerations apply to the parameter g in (52): The nonlinear 
dynamics of the Jacobian suggests to use small values for rj in order to 
approximate the derivative well. On the other side, if rj is too small, numerical 
problems can arise, since both the numerator and the denominator in (52) 
approach zero. 

Table shows how an appropriate selection of the parameters a and p can 
be used to optimise the reconstruction quality for EED-based inpainting. We 
observe that in our test scenario the resulting tonal optimisation step allows 
to reduce the MSE from 12.62 to 10.79. The corresponding reconstruction is 
depiced in Fig. [^d). Tab. shows that the MSE of 10.79 obtained for EED is 
far better than the errors for homogeneous diffusion inpainting (MSE=27.24) 
and for biharmonic inpainting (MSE=16.73). To the best of our knowledge, 
such a reconstuction quality has never been achieved before for PDE-based 
inpainting with a mask density of only 4%. 


7 Summary and Conclusions 

The contributions of our work are twofold: On the one hand we gave a 
comprehensive survey on the state-of-the-art in PDE-based image compression. 










Table 6. MSE obtained after a tonal optimisation for the EED inpainting approach with the final 
mask from Fig. The parameter a denotes the hxed step size in each gradient descent iteration, 
and rj is the step size used for the computation of the derivatives by means of finite differences. 


a 

10 “^ 10 "^ 


1.0 10.86 10.79 
1.5 10.93 10.80 
2.0 10.88 10.80 
3.0 10.88 10.81 


It enables the readers to obtain an overview of the achievements that have 
been made in this held and guide them to the relevant literature. 

On the other hand, we were focusing on one key problem in PDE-based 
image compression: the spatial and tonal data optimisation. Here we aimed at 
obtaining insights into the full potential of the quality gains, without imposing 
any constraints on the type of inpainting operator, the computational costs, 
or the coding costs for the optimised data. Our strategy was to start with 
more restricted settings that allow a deeper understanding of the problem, 
and to extend our hndings to more general scenarios afterwards. 

In the ID setting, we restricted ourselves to homogeneous diffusion im- 
painting of strictly convex functions. For the resulting free knot problem for 
linear spline interpolation, we came up with a new algorithm and discussed 
optimal approximation results. We showed the nonconvexity of the problem 
for more than three knots, and we saw that a splitting into a spatial optimi¬ 
sation step followed by a tonal one does not deteriorate the reconstruction 
quality substantially. 

This motivated us to use such a splitting strategy also for the 2D setting 
and without restrictions on the convexity or concavity of the image data. For 
spatial data optimisation with homogeneous diffusion inpainting, we studied 
a probabilistic sparsihcation approach with nonlocal pixel exchange in detail. 
For the subsequent tonal optimisation step we established the existence of a 
unique solution. We showed that it can be found efficiently with a gradient 
descent approach which is accelerated by a fast explicit diffusion (FED) 
strategy. 

In our investigations we were aiming at fairly generic algorithms that 
can also be applied to other inpainting operators such as biharmonic in¬ 
painting and inpainting based on edge-enhancing anisotropic diffusion (EED). 
Data optimisation for EED-based inpainting allowed us to come up with 
reconstructions of hitherto unparalleled quality. 

Our framework for data optimisation permits to specify the desired mask 
density a priori. This is a conceptual advantage over approaches that have 
to tune a regularisation parameter in order to influence the density which 






results a posteriori [23 E3]. In practice this means that the latter approaches 
may have to run the algorithm many times. 

While our results show the large potential of PDE-based inpainting, it 
should be emphasised that data optimisation is a key problem, bnt not the 
only one that must be solved for building well-performing codecs: For example, 
both the location and the grey valnes of the data must be stored efficiently, 
and some real-world applications reqnire very fast algorithms. This poses 
additional constraints and challenges that will be addressed in our forthcoming 
pnblications. 
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